Introduction

Topic Introduction

As someone who spends a lot of time on social media, I find it interesting to see the different ways people use websites like Twitter. One of the many uses of social media is spreading awareness about social movements. This was especially visible over the last year, with floods of information about topics such as the COVID-19 pandemic, LGBTQ rights, and racial justice. The Black Lives Matter movement in particular took off on social media after the unjust murders by police of Breonna Taylor and George Floyd, which sparked nationwide protests throughout the summer. Although Black Lives Matter was founded in 2013 and has been spreading the phrase since then, the movement gained far wider attention in 2020. Unfortunately, not everyone in our country agrees with the movement, and several counter-slogans have emerged, such as “Blue Lives Matter” in reference to the police and “All Lives Matter”.

Project Inspiration

My inspiration for this project came from a tweet with a map of the United States that compared the prevalence of a certain topic in tweets around the country. I thought that was really interesting, and wanted to see if it was something I would be able to do. I am also interested in using data to make a difference, and thought that working with a political topic would be a way to get into that.

Data Collection

I collected the data myself by pulling tweets from the Twitter API. To do this, I had to apply for a Twitter developer account and get my project approved. Once my project was approved, I could start pulling the tweets. Since this project was based on locations, I wanted to restrict each search to tweets within 100 miles of a certain geocode (latitude and longitude) in each state. My first idea was to use the capital of each state, but I thought this could introduce bias, since capital cities are often more liberal than other areas. Instead, I used a website to randomly generate a city in each state and then used Google Maps to find the geocode of that city. This approach had its own problems, since some states returned around 900 tweets and other states returned as few as 4. The original scope of the project was to look at the hashtags #BlackLivesMatter and #AllLivesMatter, but I switched to searching for just the phrases after getting a very small amount of data from #AllLivesMatter.
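The location restriction described above can be sketched as follows. This is an illustrative Python sketch, not the project's R code. It assumes the search takes a geocode as a single "latitude,longitude,radius" string, which is the form the Twitter standard search API uses. The function name and the coordinates are hypothetical.

```python
def geocode_query(lat: float, lon: float, radius_miles: int = 100) -> str:
    """Format a 'latitude,longitude,radius' string for a location-restricted search."""
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        raise ValueError("latitude or longitude out of range")
    return f"{lat:.4f},{lon:.4f},{radius_miles}mi"


# Hypothetical coordinates, for illustration only.
example = geocode_query(44.2601, -72.5754)
print(example)
```

One such string would be built per state from the randomly chosen city, so the 100-mile radius stays the same while only the center point changes.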

Once I had a geocode for each state, I pulled the tweets using the ‘search_tweets’ function from the ‘rtweet’ package. The final version of the data comes from tweets that I pulled on 4/26/2021, which matters because my API access only returns tweets from the 7 days before the command is run. I ran the command twice for each state, once with ‘black lives matter’ and once with ‘all lives matter’. I probably could have used a loop to make this more efficient, but I wanted to see how many tweets were being returned from each individual state and to make sure I did not exceed the number of requests I could make to the Twitter server. After I had the individual state data sets, I combined them into two data sets, one for the ‘black lives matter’ tweets and another for the ‘all lives matter’ tweets. The search_tweets function returns a lot of information about each tweet, so I then created a ‘tidy’ version of each data set that includes only the variables I thought could be important to my analysis. I also removed the screen name related variables to give the users some anonymity and so that I could feel more comfortable sharing my actual data set. Although these users tweeted publicly, I did not feel comfortable making all of the information I have access to available to anyone I share the package with. The final versions of the data can be accessed in the package using “BLMTweets” or “ALMTweets”.
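The combine-and-anonymize step can be sketched roughly as below. This is an illustrative Python sketch rather than the R code behind the package. The field names (`screen_name`, `text`, `created_at`) and the example rows are assumptions made up for the illustration, not the actual variables or data in “BLMTweets” or “ALMTweets”.

```python
# Field names are assumptions for illustration, not the package's actual variables.
IDENTIFYING_FIELDS = {"screen_name", "reply_to_screen_name", "mentions_screen_name"}


def tidy_tweets(per_state):
    """Combine per-state tweet lists into one list of rows.

    Each row is tagged with its state, and identifying fields are dropped.
    """
    combined = []
    for state, tweets in per_state.items():
        for tweet in tweets:
            row = {k: v for k, v in tweet.items() if k not in IDENTIFYING_FIELDS}
            row["state"] = state
            combined.append(row)
    return combined


# Made-up example rows, for illustration only.
per_state = {
    "Vermont": [{"text": "example tweet one", "screen_name": "user_a",
                 "created_at": "2021-04-26"}],
    "Ohio": [{"text": "example tweet two", "screen_name": "user_b",
              "created_at": "2021-04-25"}],
}
tidy = tidy_tweets(per_state)
print(tidy)
```

Building new rows instead of deleting keys in place leaves the raw pull untouched, so only the reduced copy needs to be shared.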

Analysis Background

The main analysis that I did was sentiment analysis. I wrote a function that takes in either the “black lives matter” data set or the “all lives matter” data set and returns a data set with sentiment information by state. This function breaks the tweets down into words, removes stop words (“the”, “and”, etc.), and matches the remaining words to the “bing” lexicon, which labels each word as either positive or negative. I then calculated a total score for each state by subtracting the number of negative words from the number of positive words. A positive score means that a state has more positive than negative words in its collected tweets. I also calculated the average sentiment of each state by dividing the total sentiment score by the number of tweets that were included in the sentiment analysis. Some tweets are completely excluded from the sentiment analysis because they contain only stop words or just tag another user, so those tweets do not count towards the average sentiment score.
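The scoring steps above can be sketched as follows. This is an illustrative Python sketch, not the R function from the project. The stop-word set and the lexicon are tiny made-up stand-ins for the real stop-word list and the “bing” lexicon, and the example tweets are invented.

```python
import re
from collections import defaultdict

# Tiny stand-ins for the stop-word list and the "bing" lexicon (illustrative only).
STOP_WORDS = {"the", "and", "a", "is", "of", "to", "for"}
LEXICON = {"justice": "positive", "support": "positive",
           "murder": "negative", "racist": "negative"}


def sentiment_by_state(tweets):
    """tweets: iterable of (state, text) pairs.

    Returns {state: {"total": ..., "n_tweets": ..., "average": ...}}, where
    total is positive words minus negative words and average divides the
    total by the number of tweets that had at least one lexicon word.
    """
    totals = defaultdict(int)
    counted = defaultdict(int)
    for state, text in tweets:
        words = [w for w in re.findall(r"[a-z']+", text.lower())
                 if w not in STOP_WORDS]
        scores = [1 if LEXICON[w] == "positive" else -1
                  for w in words if w in LEXICON]
        if not scores:
            continue  # no lexicon words, so the tweet is excluded entirely
        totals[state] += sum(scores)
        counted[state] += 1
    return {state: {"total": totals[state],
                    "n_tweets": counted[state],
                    "average": totals[state] / counted[state]}
            for state in totals}


# Invented example tweets, for illustration only.
example = [
    ("Vermont", "Justice for the victims"),
    ("Vermont", "Racist murder is racist"),
    ("Vermont", "@someone"),
    ("Ohio", "We support justice"),
]
result = sentiment_by_state(example)
print(result)
```

In this toy input the third Vermont tweet only tags a user, so it matches nothing in the lexicon and is left out of both the total and the tweet count used for the average.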

Expected Outcomes

My hypothesis was that states that are typically more liberal would have a positive sentiment associated with “black lives matter” tweets, and a negative sentiment associated with “all lives matter” tweets. For states that are typically more conservative, I expected the opposite.

Black Lives Matter

Chart 1

Chart 2

Chart 3

Chart 4

Analysis

From the plot of total sentiment score by state, we can see that all of the total scores are negative. This implies that the tweets using “black lives matter” contain more negative words than positive words. This plot also shows some surprising states as being the most negative, which is likely because those states returned the largest amounts of data. Looking at the plot of average sentiment score by state, we see that Vermont is on average the least negative, and South Dakota is on average the most negative. Most states fall in the light green range, with an average score between about -0.5 and just past -1. There do not appear to be any geographical trends. Alaska is not included in the data because it returned 0 observations, and Nebraska returned only 3, all of which were removed when joining the lexicon.

The two bar plots show the 10 most used positive and negative words in tweets with “black lives matter”. The negative words have much higher counts than the positive words, which supports the observation that the tweets tend to be more negative. I find this especially interesting because the words shown on the negative word bar plot tend to be associated with information that Black Lives Matter activists are trying to spread. It is also interesting that “trump” is considered to be a positive word according to the lexicon…

All Lives Matter

Chart 1

Chart 2

Chart 3

Chart 4

Analysis

From the plot of total sentiment score by state, we can see that almost all of the states are negative, with North Dakota the only positive one. North Dakota has only one tweet with “all lives matter”, but that tweet has a positive sentiment score of 1. We can also see that the totals are not as negative as the “black lives matter” totals were, which is likely due to the smaller amount of data available. Looking at the plot of average sentiment score by state, we can see again that North Dakota is the only state with a positive average sentiment, and that Minnesota has the most negative average sentiment. There do not appear to be any geographical trends. Alaska, Montana, and Nebraska are not included in the data because they either returned 0 observations or returned observations that were removed when the lexicon was joined. It is interesting that this data has one more missing state than the “black lives matter” data.

From the counts of the positive and negative words, we can see that there are fewer words overall than in the ‘black lives matter’ tweets. However, there are still more negative words than positive words in the tweets with ‘all lives matter’. The negative words in ‘all lives matter’ include two swear words, whereas ‘black lives matter’ did not have any. Both the positive and negative words in the ‘all lives matter’ tweets are very similar to those in the tweets with ‘black lives matter’.

Comparison

Chart 1

Chart 2

Table 1

Positive Words Used in Both Datasets

Table 2

Negative Words Used in Both Datasets

Analysis

In the chart of the difference in total sentiment between tweets containing “black lives matter” and “all lives matter”, the difference for each state is calculated as the “black lives matter” score minus the “all lives matter” score. A negative number shows that the “black lives matter” tweets are more negative than the “all lives matter” tweets. For example, Washington had a total score of -271 for the “black lives matter” tweets and a total score of -176 for the “all lives matter” tweets. The total difference score is -95 (-271 - (-176)), which shows that the “black lives matter” tweets are more negative by 95. A positive number shows that the “all lives matter” tweets were more negative than the “black lives matter” tweets. For example, Wyoming had a total score of -4 for the “black lives matter” tweets and a total score of -6 for the “all lives matter” tweets. The total difference score is 2 (-4 - (-6)), which shows that the “all lives matter” tweets are more negative by 2.
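The difference calculation can be sketched as below, using the Wyoming totals quoted above. This is an illustrative Python sketch rather than the project's R code. It assumes that a state missing from either data set is simply left out of the comparison, and the “Example State” entry is hypothetical.

```python
def score_differences(blm_totals, alm_totals):
    """Return the BLM score minus the ALM score for states present in both inputs."""
    return {state: blm_totals[state] - alm_totals[state]
            for state in blm_totals if state in alm_totals}


blm_totals = {"Wyoming": -4, "Example State": -10}  # "Example State" is hypothetical
alm_totals = {"Wyoming": -6}

differences = score_differences(blm_totals, alm_totals)
print(differences)  # {'Wyoming': 2}
```

The result for Wyoming is positive because subtracting a more negative number raises the value, which is why a positive difference means the “all lives matter” tweets were the more negative ones.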

The next plot is similar, and shows the difference in the average sentiment of the tweets. A negative value shows that on average, tweets with “black lives matter” are more negative than tweets with “all lives matter”. A positive value shows that on average, tweets with “all lives matter” are more negative than tweets with “black lives matter”. North Dakota stands out as being the most negative, and Minnesota stands out as being the most positive.

The next component of this page allows others to explore some of the words used most often in these tweets. The tables list every positive and negative word that appears in both the “black lives matter” tweets and the “all lives matter” tweets.
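One way to find the words that appear in both sets of tweets is sketched below. This is an illustrative Python sketch, not the code behind the tables. The word lists are made up, and including a count from each data set is an assumption about what such a table might show.

```python
from collections import Counter


def shared_word_table(blm_words, alm_words):
    """Return (word, blm_count, alm_count) rows for words found in both lists.

    Rows are sorted by combined count (highest first), then alphabetically.
    """
    blm_counts, alm_counts = Counter(blm_words), Counter(alm_words)
    shared = blm_counts.keys() & alm_counts.keys()
    rows = [(word, blm_counts[word], alm_counts[word]) for word in shared]
    return sorted(rows, key=lambda row: (-(row[1] + row[2]), row[0]))


# Made-up word lists, for illustration only.
blm_negative = ["guilty", "guilty", "murder", "racist"]
alm_negative = ["guilty", "racist", "racist", "hate"]

table = shared_word_table(blm_negative, alm_negative)
print(table)
```

The same function would be run once on the positive words and once on the negative words to produce the two tables.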

From the maps we can see that there does not seem to be much association between the sentiment of tweets and geographic location. To research this further, more data collection at varying dates and at multiple locations within each state may be beneficial. It would also be interesting to explore better ways of scoring the sentiment of a whole tweet. For example, “guilty” is automatically counted as a negative word, but a tweet announcing that Derek Chauvin was found guilty is actually spreading positive information.